AI coding benchmarks AI News List | Blockchain.News

List of AI News about AI coding benchmarks

Time Details
2025-10-14 02:59
Claude Sonnet 4.5 Launches with Variable Reasoning Token Budget, 1M Token Context, and Advanced Coding Features for AI Developers

According to DeepLearning.AI, Anthropic has released Claude Sonnet 4.5, which introduces a variable reasoning-token budget and supports input contexts from 200,000 up to 1 million tokens. The model shows improved performance on multiple coding and reasoning benchmarks, which makes it attractive for enterprise AI applications and complex coding workflows. It is available for free online and via API at $3 per million input tokens and $15 per million output tokens (source: DeepLearning.AI, 2025-10-14). Anthropic also launched a Claude Agent SDK and updated Claude Code with automatic context tracking and summarization, a persistent memory tool, checkpoints for safe rollbacks, and a Visual Studio Code-compatible IDE extension. These additions give developers tools for building scalable, context-aware AI agents and for automating enterprise software development workflows (source: DeepLearning.AI, 2025-10-14).
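To make the quoted API rates concrete, the following is a minimal sketch of a per-request cost estimate. It assumes the flat rates above ($3 per million input tokens, $15 per million output tokens) apply to every token; actual billing may differ, for example for very long contexts or other features. The function name and the example token counts are hypothetical and chosen only for illustration.

```python
# Rates quoted in the news item above, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: a single flat rate per token type, with no discounts
    or surcharges. This is an illustration, not a billing formula.
    """
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Hypothetical request: a 200,000-token prompt and a 4,000-token reply.
cost = estimate_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # prints "$0.66"
```

Under these assumptions the prompt dominates the cost ($0.60 of the $0.66), even though output tokens are five times more expensive per token.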

2025-06-05 19:26
Gemini 2.5 Pro Preview Delivers +24 LMArena Elo, Outperforming in Coding, Science, and AI Reasoning Benchmarks

According to Oriol Vinyals (@OriolVinyalsML), Google has introduced the Gemini 2.5 Pro preview, which gains 24 points in LMArena Elo score over the previous version. The model is reported to lead benchmarks in math and coding (AIME, Aider), science problem solving (GPQA), and complex reasoning (HLE). Improvements to response style and structure, informed by user feedback, make Gemini 2.5 Pro a candidate for businesses seeking generative AI solutions in software development, scientific research, and advanced analytics (Source: @OriolVinyalsML, Twitter, June 5, 2025).
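For a sense of scale, a rating gap can be converted into an expected head-to-head win rate. The sketch below assumes the conventional Elo expected-score formula with a scale of 400; LMArena's own score computation may differ, so the result is a rough guide only. The function name is hypothetical.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side for a given rating gap.

    Assumption: the conventional logistic Elo formula with scale 400.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# The +24 point gap reported in the item above.
print(f"{elo_expected_score(24):.3f}")  # prints "0.534"
```

Under this assumption, a 24-point gain corresponds to the newer version being preferred in roughly 53% of direct comparisons against the older one.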
